Improved Interpretability


Regularizing Black-box Models for Improved Interpretability

Neural Information Processing Systems

Most of the work on interpretable machine learning has focused on designing either inherently interpretable models, which typically trade off accuracy for interpretability, or post-hoc explanation systems, whose explanation quality can be unpredictable. Our method, ExpO, is a hybridization of these approaches that regularizes a model for explanation quality at training time. Importantly, these regularizers are differentiable, model-agnostic, and require no domain knowledge to define. We demonstrate that post-hoc explanations for ExpO-regularized models have better explanation quality, as measured by the common fidelity and stability metrics. We verify that improving these metrics leads to significantly more useful explanations with a user study on a realistic task.
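
To make the regularization concrete, here is a minimal sketch of a neighborhood-fidelity penalty in the spirit of ExpO, assuming PyTorch and a differentiable model f; the function fidelity_regularizer and the hyperparameters sigma, n_samples, and lam are illustrative stand-ins, not the paper's exact formulation. It samples a Gaussian neighborhood around each input, fits a local linear model to f in closed form, and penalizes the residual, rewarding models that are locally well approximated by simple explanations.

```python
import torch

def fidelity_regularizer(f, x, sigma=0.1, n_samples=32):
    """Penalize how poorly a local linear model fits f around each input.

    x: (batch, d) inputs. Returns a scalar regularization term.
    """
    batch, d = x.shape
    # Sample a Gaussian neighborhood around every input point.
    noise = sigma * torch.randn(batch, n_samples, d, device=x.device)
    neighbors = x.unsqueeze(1) + noise                        # (batch, n, d)
    preds = f(neighbors.reshape(-1, d)).reshape(batch, n_samples, -1)
    # Fit a linear model (with intercept) to f on each neighborhood in
    # closed form; pinv keeps the residual differentiable w.r.t. preds.
    ones = torch.ones(batch, n_samples, 1, device=x.device)
    X = torch.cat([neighbors, ones], dim=-1)                  # (batch, n, d+1)
    beta = torch.linalg.pinv(X) @ preds                       # (batch, d+1, out)
    residual = X @ beta - preds
    return residual.pow(2).mean()

# Training objective: ordinary task loss plus the fidelity penalty.
# loss = task_loss(f(x), y) + lam * fidelity_regularizer(f, x)
```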


Neural Interaction Transparency (NIT): Disentangling Learned Interactions for Improved Interpretability

Tsang, Michael; Liu, Hanpeng; Purushotham, Sanjay; Murali, Pavankumar; Liu, Yan

Neural Information Processing Systems

Neural networks are known to model statistical interactions, but they entangle the interactions at intermediate hidden layers for shared representation learning. We propose a framework, Neural Interaction Transparency (NIT), that disentangles the shared learning across different interactions to obtain their intrinsic lower-order and interpretable structure. This is done through a novel regularizer that directly penalizes interaction order. We show that disentangling interactions reduces a feedforward neural network to a generalized additive model with interactions, which can lead to transparent models that perform comparably to the state-of-the-art models. NIT is also flexible and efficient; it can learn generalized additive models with maximum $K$-order interactions by training only $O(1)$ models.
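
A minimal sketch of the idea, assuming PyTorch: parallel subnetworks ("blocks") whose outputs are summed form a generalized additive model with interactions, and a soft gate per (block, feature) lets a penalty bound each block's interaction order. The class AdditiveInteractionNet, the sigmoid gating, and the hinge-style order penalty below are simplified stand-ins for NIT's actual regularizer.

```python
import torch
import torch.nn as nn

class AdditiveInteractionNet(nn.Module):
    """Sum of per-block subnetworks: a generalized additive model with interactions."""
    def __init__(self, d_in, n_blocks=8, hidden=32, max_order=2):
        super().__init__()
        self.max_order = max_order
        # One learnable gate per (block, input feature); sigmoid(gate)
        # approximates whether the block uses that feature.
        self.gates = nn.Parameter(torch.zeros(n_blocks, d_in))
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_blocks)
        ])

    def forward(self, x):
        g = torch.sigmoid(self.gates)                 # (n_blocks, d_in)
        # Each block sees only its gated features; summing the block
        # outputs yields an additive model over the blocks.
        outs = [blk(x * g[i]) for i, blk in enumerate(self.blocks)]
        return torch.stack(outs, dim=0).sum(dim=0)    # (batch, 1)

    def order_penalty(self):
        g = torch.sigmoid(self.gates)
        order = g.sum(dim=1)          # soft interaction order of each block
        # Penalize blocks whose soft order exceeds max_order, plus a small
        # sparsity term that pushes unused gates toward zero.
        return torch.relu(order - self.max_order).sum() + 0.01 * g.sum()

# loss = task_loss(model(x), y) + lam * model.order_penalty()
```

Because each block's output depends only on its gated feature subset, a trained model can be read off block by block, one interaction term at a time.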


Reviews: Neural Interaction Transparency (NIT): Disentangling Learned Interactions for Improved Interpretability

Neural Information Processing Systems

This paper proposes a novel approach to more interpretable learning in neural networks. In particular, it addresses the common criticism that the computations performed by neural networks are often hard to interpret intuitively, which can be a problem in applications, e.g. in the medical or financial fields. The authors suggest adding a novel regulariser to the weights of the first layer of a neural network to discover non-additive interactions among the data features up to a chosen order, and to preserve these relationships without entangling them. These interactions can then be further processed by the separate columns of the neural network. The approach was evaluated on a number of datasets and seems to perform similarly to the baselines on regression and classification tasks, while being more interpretable and less computationally expensive.
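
A rough sketch of the first-layer penalty the review describes, assuming PyTorch: a group-lasso term over a block-partitioned first-layer weight matrix drives entire feature-to-column connections to zero, capping each column's interaction order. The function first_layer_group_penalty and the block partitioning are illustrative, not the paper's exact construction.

```python
import torch

def first_layer_group_penalty(W, n_blocks):
    """Group-lasso penalty on a block-partitioned first-layer weight matrix.

    W: (hidden, d_in) first-layer weights; hidden must be divisible by n_blocks.
    """
    hidden, d_in = W.shape
    # Partition the hidden units into columns: (n_blocks, hidden_per_block, d_in).
    Wb = W.reshape(n_blocks, hidden // n_blocks, d_in)
    # L2 norm over each column's connections to each input feature; summing
    # these norms drives whole feature-to-column groups to exactly zero,
    # so each column ends up depending on only a few features.
    return Wb.norm(dim=1).sum()

# Example usage on a hypothetical network `net` with first layer `net.fc1`:
# loss = task_loss(net(x), y) + lam * first_layer_group_penalty(net.fc1.weight, n_blocks=8)
```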

